Face and Facial Keypoint detection

After you've trained a neural network to detect facial keypoints, you can apply it to any image that contains faces. The network expects an input Tensor of a specific size, so before it can find keypoints on a face, the image needs some pre-processing. The full workflow is:

  1. Detect all the faces in an image using a face detector (we'll be using a Haar Cascade detector in this notebook).
  2. Pre-process those face images so that they are grayscale and transformed into a Tensor of the input size your net expects. This step is similar to the data_transform you created and applied in Notebook 2, whose job was to rescale, normalize, and turn any image into a Tensor that your CNN accepts as input.
  3. Use your trained model to detect facial keypoints on the image.
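The three steps above can be sketched as a small pipeline in which each stage is a pluggable function. This is a minimal NumPy sketch, not the notebook's code: keypoint_pipeline and the stand-in lambdas are hypothetical names, while in this notebook the real stages are run_haar, the pre-processing inside detect_keypoints, and the trained net. The result format mirrors the {'xywh', 'pts'} dicts that detect_keypoints returns later on.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# (x, y, w, h): top-left corner plus width and height, the format detectMultiScale returns
Box = Tuple[int, int, int, int]


def keypoint_pipeline(
    image: np.ndarray,
    detect_faces: Callable[[np.ndarray], Sequence[Box]],
    preprocess: Callable[[np.ndarray], np.ndarray],
    predict: Callable[[np.ndarray], np.ndarray],
) -> List[dict]:
    """Run detect -> pre-process -> predict for every face found in `image`."""
    results = []
    for (x, y, w, h) in detect_faces(image):   # step 1: find the faces
        crop = image[y:y + h, x:x + w]          # cut out one face (rows are y, columns are x)
        net_input = preprocess(crop)            # step 2: make it look like the training data
        pts = predict(net_input)                # step 3: run the trained model
        results.append({'xywh': [x, y, w, h], 'pts': pts})
    return results


# Hypothetical stand-ins, purely for illustration (no real detector or model involved)
demo_image = np.zeros((100, 100, 3), dtype=np.uint8)
demo_results = keypoint_pipeline(
    demo_image,
    detect_faces=lambda im: [(10, 20, 30, 30)],
    preprocess=lambda crop: crop.astype(np.float32) / 255.0,
    predict=lambda net_input: np.zeros((16, 2), dtype=np.float32),
)
print(demo_results[0]['xywh'], demo_results[0]['pts'].shape)
```

Keeping the stages as separate functions makes it easy to swap the Haar detector or the model without touching the rest of the loop.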

In the next Python cell, we load the libraries required for this section of the project.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

Select an image

Select an image to perform facial keypoint detection on; you can select any image of faces in the images/ directory.

In [2]:
import cv2
# load in color image for face detection
image = cv2.imread('images/obamas.jpg')
#image = cv2.imread('images/mona_lisa.jpg')
#image = cv2.imread('images/the_beatles.jpg')


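# cv2.imread returns None instead of raising an error if the path is wrong
assert image is not None, 'Could not read the image; check the path'

# OpenCV loads images in BGR channel order, while matplotlib expects RGB,
# so swap the red and blue channels before plotting
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)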

# plot the image
fig = plt.figure(figsize=(9,9))
plt.imshow(image)
Out[2]:
<matplotlib.image.AxesImage at 0x7ff4a6f5d690>
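Why the conversion is needed: cv2.imread returns pixels in BGR order, while matplotlib's imshow assumes RGB. For a 3-channel image, the conversion amounts to reversing the channel axis. A minimal NumPy sketch (bgr_to_rgb is a hypothetical helper, not an OpenCV function):

```python
import numpy as np


def bgr_to_rgb(image_bgr: np.ndarray) -> np.ndarray:
    """Swap B,G,R -> R,G,B by reversing the channel (last) axis.

    For 3-channel images this gives the same pixels as
    cv2.cvtColor(image, cv2.COLOR_BGR2RGB). The .copy() returns a
    contiguous array rather than a negatively strided view.
    """
    return image_bgr[..., ::-1].copy()


# a 1 x 1 image whose only pixel is pure blue, stored in OpenCV's BGR order
pixel_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
pixel_rgb = bgr_to_rgb(pixel_bgr)
print(pixel_rgb[0, 0].tolist())  # [0, 0, 255]: blue is now the last channel
```

The copy costs a little memory but hands downstream code an ordinary contiguous array, which is the safer choice when the result is passed on to other libraries.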

Detect all faces in an image

Next, you'll use one of OpenCV's pre-trained Haar Cascade classifiers, all of which can be found in the detector_architectures/ directory, to find any faces in your selected image.

In the code below, we loop over each face in the original image and draw a red square on each face (in a copy of the original image, so as not to modify the original). You can even add eye detections as an optional exercise in using Haar detectors.

An example of face detection on a variety of images is shown below.

In [3]:
def run_haar(image):
    # load a pre-trained Haar cascade classifier for detecting frontal faces
    face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
    if face_cascade.empty():
        raise IOError('Could not load the Haar cascade XML file')
    
    # run the detector; the output has one (x, y, w, h) row per detection:
    # the top-left corner plus the width and height of each bounding box.
    # If necessary, tune these parameters until every face in a given image is found.
    faces = face_cascade.detectMultiScale(image, scaleFactor=1.2, minNeighbors=2)
    return faces

faces = run_haar(image)

# make a copy of the original image to plot detections on
image_with_detections = image.copy()

# loop over the detected faces, mark the image where each face is found
for (x,y,w,h) in faces:
    # draw a rectangle around each detected face
    # depending on image resolution, you may also need to change the line thickness (last argument)
    cv2.rectangle(image_with_detections,(x,y),(x+w,y+h),(255,0,0),3) 

fig = plt.figure(figsize=(9,9))

plt.imshow(image_with_detections)
Out[3]:
<matplotlib.image.AxesImage at 0x7ff4a6f9b150>
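Each row of faces is (x, y, w, h): the top-left corner plus the width and height. The notebook uses that format in two ways, turning it into two corners for cv2.rectangle and slicing the face out of the image array (later, in detect_keypoints). A minimal sketch of both conversions; box_to_corners and crop_box are hypothetical helpers, and the clipping to the image bounds is an extra safeguard that the notebook's code does not perform, since Haar boxes normally lie inside the image.

```python
import numpy as np


def box_to_corners(box):
    """(x, y, w, h) -> ((x0, y0), (x1, y1)): the two corners cv2.rectangle takes."""
    x, y, w, h = (int(v) for v in box)
    return (x, y), (x + w, y + h)


def crop_box(image, box):
    """Return the face region of an H x W x C array, clipped to the image bounds."""
    x, y, w, h = (int(v) for v in box)
    img_h, img_w = image.shape[:2]
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, img_w), min(y + h, img_h)
    # NumPy indexes rows (y) first, then columns (x)
    return image[y0:y1, x0:x1]


demo_image = np.zeros((200, 300, 3), dtype=np.uint8)  # 200 rows tall, 300 columns wide
demo_box = (250, 150, 80, 80)                          # hangs off the bottom-right corner
print(box_to_corners(demo_box))              # ((250, 150), (330, 230))
print(crop_box(demo_image, demo_box).shape)  # (50, 50, 3)
```

The easy mistake here is indexing image[x:x+w, y:y+h]; because NumPy arrays are row-major, the y range must come first.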

Loading in a trained model

Once you have an image to work with (and, again, you can select any image of faces in the images/ directory), the next step is to pre-process that image and feed it into your CNN facial keypoint detector.

First, load your best model by its filename.

In [4]:
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils, models
#from models import Net

#net = Net()

## DONE: load the best saved model parameters (by your path name)
## You'll need to un-comment the line below and add the correct name for *your* saved model
# net.load_state_dict(torch.load('saved_models/keypoints_model_1.pt'))

## print out your net and prepare it for testing (uncomment the line below)
# net.eval()

Choose one of the following model architectures and run only that cell; the matching weights are loaded from saved_models/ further below.

In [ ]:
import torch.nn as nn
net, model_name = models.resnet18(weights=None), 'resnet18'
net.fc=nn.Linear(net.fc.in_features, 16*2)
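Replacing net.fc gives ResNet-18 a regression head with 16*2 = 32 outputs per face instead of ImageNet class scores. Those 32 numbers only become keypoints once they are reshaped into 16 (x, y) pairs, as detect_keypoints does further down. A NumPy sketch of that layout; to_keypoint_pairs is a hypothetical helper, and it assumes the training labels were flattened as x0, y0, x1, y1 and so on.

```python
import numpy as np

NUM_KEYPOINTS = 16  # the models in this notebook predict 16 (x, y) keypoints


def to_keypoint_pairs(flat_output: np.ndarray) -> np.ndarray:
    """Reshape a (batch, 2 * NUM_KEYPOINTS) output into (batch, NUM_KEYPOINTS, 2).

    Consecutive values are read as one (x, y) pair, which is what
    output_pts.view(batch, 16, -1) does in detect_keypoints.
    """
    batch = flat_output.shape[0]
    return flat_output.reshape(batch, NUM_KEYPOINTS, -1)


# two faces, 32 outputs each: 0, 1, 2, ..., 63
demo_flat = np.arange(2 * 2 * NUM_KEYPOINTS, dtype=np.float32).reshape(2, -1)
demo_pairs = to_keypoint_pairs(demo_flat)
print(demo_pairs.shape)           # (2, 16, 2)
print(demo_pairs[0, 1].tolist())  # [2.0, 3.0]: second keypoint of the first face
```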
In [5]:
# Best-performing model, judging by (my) visual inspection of the predicted keypoints.
from fkpmodels.naimishnet import YaNaimishNet2
net, model_name = YaNaimishNet2(), 'YaNaimishNet2'
In [ ]:
from fkpmodels.naimishnet import YaNaimishNet3
net, model_name = YaNaimishNet3(), 'YaNaimishNet3'
In [6]:
model_dir = 'saved_models/'
In [7]:
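# Load the saved weights into the architecture chosen above.
# map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine;
# the net is moved to the available device later on.
checkpoint = torch.load(model_dir + model_name + '.pt', map_location='cpu')
net.load_state_dict(checkpoint)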
Out[7]:
<All keys matched successfully>
In [8]:
from data_load import Rescale, Normalize, ToTensor, ToTensorRGB, FaceCrop
In [9]:
data_transform = transforms.Compose([Rescale(128),
                                     ToTensorRGB()])
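Rescale is defined in data_load.py, which isn't shown here. Assuming it follows the common convention (an int output size is matched to the shorter image edge, the aspect ratio is kept, and keypoints are scaled by the same factors), the arithmetic looks like the sketch below; rescale_shape and rescale_keypoints are hypothetical names. torchvision's transforms.Resize(128), used later in detect_keypoints, also matches the shorter edge when given an int.

```python
import numpy as np


def rescale_shape(h, w, output_size):
    """Target (new_h, new_w) for a Rescale-style transform.

    An int is matched to the shorter edge and the aspect ratio is kept;
    a (new_h, new_w) tuple is used as-is.
    """
    if isinstance(output_size, int):
        if h > w:
            new_h, new_w = output_size * h / w, output_size
        else:
            new_h, new_w = output_size, output_size * w / h
    else:
        new_h, new_w = output_size
    return int(new_h), int(new_w)


def rescale_keypoints(key_pts, h, w, new_h, new_w):
    """Scale (x, y) keypoints like the image: x by new_w / w, y by new_h / h."""
    return np.asarray(key_pts, dtype=float) * [new_w / w, new_h / h]


print(rescale_shape(160, 160, 128))  # (128, 128): a square Haar crop stays square
print(rescale_shape(300, 200, 128))  # (192, 128): the shorter edge (width) becomes 128
print(rescale_keypoints([[100, 150]], 300, 200, 192, 128).tolist())  # [[64.0, 96.0]]
```

Because Haar boxes are square, matching the shorter edge to 128 yields exactly the 128x128 input the network expects.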

Keypoint detection

Now we'll loop over each detected face in the image again, but this time you'll transform each face into a Tensor that your CNN accepts as input.

DONE: Transform each detected face into an input Tensor

You'll need to perform the following steps for each detected face:

  1. Convert the face from RGB to grayscale. (Not done here: this notebook uses RGB input instead, as described in Notebook 2.)
  2. Normalize the image so that its color range falls in [0,1] instead of [0,255].
  3. Rescale the detected face to the square size your CNN expects (224x224 is suggested; this notebook uses 128x128).
  4. Reshape the numpy image (H x W x C) into a torch image (C x H x W).

You may find it useful to consult the transformation code in data_load.py to help you perform these processing steps.
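The four steps above in plain NumPy, as a minimal sketch: preprocess_face is a hypothetical helper, the grayscale branch is optional because this notebook feeds RGB, and nearest-neighbour sampling stands in for the antialiased torchvision Resize that the notebook actually uses. The per-channel mean/std normalization that detect_keypoints additionally applies (with norm_means and norm_std from data_load.py) is left out.

```python
import numpy as np


def preprocess_face(crop_rgb, size=128, grayscale=False):
    """Turn an H x W x 3 uint8 face crop into a 1 x C x size x size float32 array."""
    img = crop_rgb.astype(np.float32)
    if grayscale:
        # step 1 (optional): weighted sum of R, G, B with the BT.601 luma weights,
        # the same weights OpenCV documents for COLOR_RGB2GRAY
        img = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
        img = img[..., np.newaxis]
    # step 2: map [0, 255] to [0, 1]
    img /= 255.0
    # step 3: nearest-neighbour resize to size x size (a simple stand-in for bilinear)
    h, w = img.shape[:2]
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    img = img[rows][:, cols]
    # step 4: H x W x C -> C x H x W, then add a batch dimension
    return img.transpose(2, 0, 1)[np.newaxis]


demo_crop = np.full((160, 160, 3), 255, dtype=np.uint8)
print(preprocess_face(demo_crop).shape)                  # (1, 3, 128, 128)
print(preprocess_face(demo_crop, grayscale=True).shape)  # (1, 1, 128, 128)
```

Note that a grayscale tensor has one channel, so it only fits a network whose first layer was trained on single-channel input.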

DONE: Detect and display the predicted keypoints

After each face has been converted into an input Tensor, you can apply your net to it. Older PyTorch code wraps that Tensor in a Variable() first; since PyTorch 0.4 this is unnecessary, because Variable has been merged into Tensor. The output should be the predicted facial keypoints. These keypoints need to be "un-normalized" for display, and you may find it helpful to write a helper function like show_keypoints. You should end up with an image like the following, with facial keypoints that closely match the facial features on each individual face:
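Un-normalizing reverses each step that moved the keypoints away from image pixels. Assuming the (pts - 100) / 50.0 keypoint normalization from data_load.py's Normalize (the constants that plot_keypoints inverts) and a 128x128 network input, a sketch for one face box could look like this; keypoints_to_image_coords is a hypothetical name. It scales x by the box width and y by its height, which matches plot_keypoints for square boxes (plot_keypoints uses the box height for both axes).

```python
import numpy as np


def keypoints_to_image_coords(pts, box, input_size=128, mean=100.0, scale=50.0):
    """Map normalized keypoints for one face back to pixel coordinates in the full image.

    1. undo the keypoint normalization: pts * scale + mean
    2. rescale from the input_size x input_size network frame to the face box
    3. shift by the box's top-left corner
    """
    x, y, w, h = box
    pts = np.asarray(pts, dtype=np.float64) * scale + mean
    pts = pts * np.array([w, h]) / input_size
    return pts + np.array([x, y])


demo_pts = [[0.0, 0.0], [-1.0, 0.5]]
demo_coords = keypoints_to_image_coords(demo_pts, (371, 145, 160, 160))
print(demo_coords.tolist())  # [[496.0, 270.0], [433.5, 301.25]]
```

A normalized (0, 0) lands at pixel (100, 100) of the 128x128 crop, which is then stretched to the 160-pixel box and shifted by the box corner.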

In [10]:
from data_load import norm_means, norm_std
In [11]:
import torch
from torchvision import transforms, models
In [12]:
def imshow(image, ax=None, title=None, normalize=True):
    """Imshow for Tensor."""
    if ax is None:
        fig, ax = plt.subplots()
    image = image.numpy().transpose((1, 2, 0))

    if normalize:
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        image = std * image + mean
        image = np.clip(image, 0, 1)

    plt.imshow(image)
    return image


def plot_keypoints(key_pts):
    def plot_keypoints_single(key_pts, offset=(0, 0), wh=128):
        key_pts_copy = np.copy(key_pts)
        # Invert keypoint normalization in data_load.py:Normalize: key_pts_copy = (key_pts_copy - 100)/50.0
        key_pts_copy = key_pts_copy*50.0 + 100.0
        key_pts_copy = key_pts_copy * wh / 128.0 + offset
        plt.scatter(key_pts_copy[:, 0], key_pts_copy[:, 1], s=20, marker='.', c='m')
    if isinstance(key_pts, list):
        for face in key_pts:
            plot_keypoints_single(face['pts'], offset=tuple(face['xywh'][0:2]), wh=face['xywh'][3])
    else:
        plot_keypoints_single(key_pts)


def show_keypoints(image, key_pts, ax=None, normalize=True):
    """Show image with keypoints"""
    if ax is None:
        fig, ax = plt.subplots()

    if isinstance(image, torch.Tensor):
        image = image.numpy().transpose((1, 2, 0))

        if normalize:
            mean = np.array([0.485, 0.456, 0.406])
            std = np.array([0.229, 0.224, 0.225])
            image = std * image + mean
            image = np.clip(image, 0, 1)

    #plt.imshow(image, cmap='gray')
    plt.imshow(image)
    plot_keypoints(key_pts)
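imshow and show_keypoints undo the channel normalization for display with hard-coded ImageNet statistics, whereas detect_keypoints normalizes with norm_means and norm_std imported from data_load.py. The plotted faces only look right if those two sets of numbers agree, which this notebook does not show. A NumPy sketch of the forward/inverse pair (normalize_chw and denormalize_to_hwc are hypothetical names):

```python
import numpy as np

# ImageNet statistics, as hard-coded in imshow and show_keypoints
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def normalize_chw(img_chw, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Per-channel (x - mean) / std on a C x H x W array, as transforms.Normalize computes."""
    return (img_chw - mean[:, None, None]) / std[:, None, None]


def denormalize_to_hwc(img_chw, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Inverse used for display: C x H x W -> H x W x C, then x * std + mean, clipped to [0, 1]."""
    img = img_chw.transpose(1, 2, 0)
    return np.clip(img * std + mean, 0.0, 1.0)


rng = np.random.default_rng(0)
demo_chw = rng.random((3, 4, 5))  # values in [0, 1)
restored = denormalize_to_hwc(normalize_chw(demo_chw))
print(np.allclose(restored, demo_chw.transpose(1, 2, 0)))  # True
```

If norm_means and norm_std differ from these values, passing them as mean and std to the inverse keeps the round trip exact.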
In [13]:
# Use GPU if it's available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net.to(device);
In [14]:
#face = faces[435]
#face = faces[0]
In [15]:
def detect_keypoints(image, faces, plot_faces=False):
    """Predict keypoints for each (x, y, w, h) face box in an RGB image.

    Returns a list of {'xywh': [x, y, w, h], 'pts': 16 x 2 array} dicts; the
    keypoints are still in the network's normalized coordinates.
    """
    image_copy = np.copy(image)

    normalizer = transforms.Normalize(norm_means, norm_std)
    resizer = transforms.Resize(128, antialias=True)

    net.eval()
    output_pts_list = []
    # loop over the detected faces from your haar cascade
    for (x, y, w, h) in faces:
        # Select the region of interest that is the face in the image
        roi = np.copy(image_copy[y:y+h, x:x+w])

        ## DONE: Reshape the numpy image shape (H x W x C) into a torch image shape (C x H x W)
        ## and scale the color range from [0, 255] to [0, 1]
        roi = roi.transpose((2, 0, 1)).astype(np.float64) / 255.0
        # add a batch dimension: 1 x C x H x W
        roi_tt = torch.from_numpy(roi).unsqueeze(0)
        ## DONE: Normalize each color channel with norm_means/norm_std from data_load.py
        roi_tt = normalizer(roi_tt)
        ## DONE: Rescale the detected face to be the expected square size for your CNN (128x128)
        roi_tt = resizer(roi_tt)
        # the network expects float32 input on the selected device
        roi_tt = roi_tt.float().to(device)
        ## DONE: Make facial keypoint predictions using your loaded, trained network
        ## (no gradients are needed for inference)
        with torch.no_grad():
            output_pts = net(roi_tt)
        # reshape to batch_size x 16 x 2 pts
        output_pts = output_pts.view(output_pts.size(0), 16, -1)

        output_pts_cpu = output_pts.cpu()[0]
        ## DONE: Display each detected face and the corresponding keypoints
        if plot_faces:
            show_keypoints(roi_tt.cpu()[0], output_pts_cpu)
        output_pts_list.append({
            'xywh': [x, y, w, h],
            'pts': output_pts_cpu.numpy().copy(),
        })
    return output_pts_list


detect_keypoints(image, faces, plot_faces=True)
Out[15]:
[{'xywh': [371, 145, 160, 160],
  'pts': array([[-1.522212  , -1.267344  ],
         [-1.2008464 , -1.2430794 ],
         [-1.0480812 , -1.1894549 ],
         [-0.43868402, -1.2066588 ],
         [-0.2878859 , -1.2646968 ],
         [ 0.08023396, -1.291009  ],
         [-0.7343752 , -0.89242357],
         [-0.7339932 , -0.3144935 ],
         [-1.4167624 , -0.9862068 ],
         [-1.0191752 , -0.93417287],
         [-0.36299047, -0.9454051 ],
         [ 0.06349718, -0.9744376 ],
         [-1.1558111 ,  0.00999726],
         [-0.68826854, -0.03875143],
         [-0.14182904, -0.00713402],
         [-0.8135689 ,  0.28752077]], dtype=float32)},
 {'xywh': [179, 74, 174, 174],
  'pts': array([[-1.4557476 , -1.2215672 ],
         [-1.0846001 , -1.2590228 ],
         [-0.9245182 , -1.223072  ],
         [-0.3487623 , -1.2162797 ],
         [-0.20208031, -1.2456094 ],
         [ 0.09122492, -1.1827825 ],
         [-0.6343766 , -0.9243807 ],
         [-0.57810414, -0.38074264],
         [-1.3935668 , -0.9322542 ],
         [-0.98957926, -0.9124063 ],
         [-0.36637017, -0.8909208 ],
         [ 0.0135756 , -0.8855344 ],
         [-1.2200344 ,  0.07619148],
         [-0.63269943, -0.03762779],
         [-0.1902846 ,  0.09298475],
         [-0.80537367,  0.33837438]], dtype=float32)}]
In [18]:
# DONE: Display original images with the keypoints of all faces
def detect_and_show_keypoints(image_path, plot_faces=False):
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f'Could not read image: {image_path}')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    faces = run_haar(image)
    output_pts_list = detect_keypoints(image, faces, plot_faces=plot_faces)
    # show_keypoints expects an Axes for `ax`, not a Figure
    fig, ax = plt.subplots(figsize=(9, 9))
    show_keypoints(image, output_pts_list, ax=ax)


detect_and_show_keypoints('images/obamas.jpg')
In [19]:
detect_and_show_keypoints('images/mona_lisa.jpg')
In [21]:
detect_and_show_keypoints('images/the_beatles.jpg')
In [22]:
# https://commons.wikimedia.org/wiki/File:Scientists_for_Future_2019-03-12_group_photograph_01.jpg
detect_and_show_keypoints('images/Scientists_for_Future_2019-03-12_group_photograph_01.jpg', plot_faces=True)